1 Introduction

The aim of this project is to find out if portfolios made of 15 randomly selected stocks from the S&P500 Large-Cap Index can outperform stock-picking experts and index investing. The portfolios were optimized based on the mean-variance framework developed by Harry Markowitz for simplicity, with the goal of maximizing the ex-post Sharpe Ratio during the optimization period.

The project is meant to be a fun/thought experiment, and is not meant to disprove of index or active investing. It does not make use of any statistical methods to prove that random selection, expertise or index investing is a better choice.

2 Packages Required

library(PortfolioAnalytics) # For portfolio optimization and analysis
library(RColorBrewer) # For color palettes in plots
library(ROI) # For ROI solver in portfolio optimization
library(ROI.plugin.glpk) # Part of the ROI solver for linear optimization
library(ROI.plugin.quadprog) #Part of the ROI solver for quadratic optimization
library(tidyquant) # For quantmod and PerformanceAnalytics functions
library(tidyverse) # For dplyr and ggplot2 functions (data manipulation and plotting)

3 Retrieve Index Components and Price Data

3.1 Index Components

The components of S&P500 can be obtained using tidyquant::tq_index().

# Only require ticker and maybe name of company, which is columns 1 and 2 of query result

sp500 <- tidyquant::tq_index(x = "SP500")[,1:2]

sp500
tickers <- sp500$symbol

3.2 Price Data

Using quantmod::getSymbols(), I retrieved the daily adjusted closing prices of the tickers from Yahoo Finance, starting from 1 January 2015 to 1 July 2022.

startdate <- as.Date("2015-01-01")
enddate <- as.Date("2022-07-01")

prices <- NULL

for (t in tickers) {   # Use for loop to get prices of all tickers
   
   tmp <- NULL
  
   tmp <- try(   # Use try function to skip loop and return error message when price cannot be retrieved
       expr = {quantmod::getSymbols(Symbols = t, src = "yahoo", auto.assign = FALSE, 
                                    from = startdate, to = enddate, periodicity = "daily")[,6]},
       silent = TRUE
   )
   
   if(inherits(tmp, "try-error")) {cat("ERROR:", t, sep = " "); next}
   
   names(tmp) <- t
   
   prices <- cbind(prices, tmp)
}
## ERROR: BRK.BERROR: BF.B
# Check dimension of object, start and end date of data collected
dim(prices); start(prices); end(prices)
## [1] 1887  502
## [1] "2015-01-02"
## [1] "2022-06-30"

The download has failed for BRK.B (Berkshire Hathaway) and BF.B (Brown-Forman Corporation) because the tickers in Yahoo Finance are BRK-B and BF-B respectively. I used the correct tickers to retrieve their data and added them to prices. I also checked which stocks had NA over the specified period and removed them from prices.

prices <- cbind(quantmod::getSymbols(Symbols = "BRK-B", src = "yahoo", auto.assign = F, 
                                     from = startdate, to = enddate, periodicity = "daily")[,6],
                quantmod::getSymbols(Symbols = "BF-B", src = "yahoo", auto.assign = F, 
                                     from = startdate, to = enddate, periodicity = "daily")[,6],
                prices)

colnames(prices)[1:2] <- c("BRK-B", "BF-B")

# Check which stocks have NA over the specified period
data.frame(rbind(which(colSums(is.na(prices)) > 0)), row.names = "Number of NAs")
# Keep only stocks that do not have NAs over the specified period
prices <- prices[, which(colSums(is.na(prices)) == 0)]

dim(prices) # Left with 482 stocks
## [1] 1887  482

3.3 Sample Tickers for Portfolios

15 tickers were sampled from prices without replacement each time, which will be used to create 10 portfolios at the end. The set.seed() allows the random sample to be reproducible.

samp_stocks <- list()

for (i in 1:10) {
  set.seed((50+i)^2)
  
  samp_stocks[[i]] <- prices[, sample(x = ncol(prices), size = 15, replace = FALSE)]
  
  names(samp_stocks)[i] <- paste("S", i, "_prices", sep = "")
}

# Check stocks sampled into each portfolio
data.frame(sapply(samp_stocks, colnames), row.names = 1:15)

4 Calculate Daily Stock Returns and Optimize Portfolios

4.1 Calculate Returns

Discrete returns are used as it can be added across assets, unlike log-returns. Returns can be calculated using PerformanceAnalytics::Return.calculate() but since samp_stocks is a list object, the function needs to be used together with lapply().

stock_returns <- lapply(samp_stocks, function(x) {
  na.omit(PerformanceAnalytics::Return.calculate(x, method = "discrete"))
})

names(stock_returns) <- paste("S", c(seq(1:10)), "_returns", sep = "")

data.frame(sapply(stock_returns, dim), row.names = c("Number of rows", "Number of columns"))
# Show first 3 elements in stock_returns, first 5 columns and first 4 rows of data
lapply(stock_returns[1:2], function(x) {
  head(x[,1:5], n = 4)
})
## $S1_returns
##                      WY            V        AMGN          NOC        SCHW
## 2015-01-05  0.000000000 -0.022073753 -0.01188338 -0.021097814 -0.03342164
## 2015-01-06 -0.001107650 -0.006443594 -0.03221695  0.005510299 -0.03663123
## 2015-01-07  0.003049679  0.013398154  0.03492451  0.031631465  0.01954536
## 2015-01-08  0.010779294  0.013412803 -0.00360209  0.023197606  0.02614153
## 
## $S2_returns
##                      DG          MCD         MOS         SBAC         CINF
## 2015-01-05 -0.006498952 -0.011044559 -0.01791585 -0.015769986 -0.014338451
## 2015-01-06 -0.012656687  0.001843226  0.01334830 -0.005218771 -0.007666568
## 2015-01-07  0.012098692  0.017424265  0.00526882  0.013713724  0.013668776
## 2015-01-08 -0.010815288  0.003723105  0.01179311  0.009987207  0.021301508

4.2 Create Portfolio Specification

Using PortfolioAnalytics::portfolio.spec(), portfolio specification for the mean-variance framework can be created. Because each portfolio contains different assets, we need to create 10 specifications for optimization.

portspecs <- list()

for (s in 1:10) {
  # Initialize portfolio specification
  portspecs[[s]] <- PortfolioAnalytics::portfolio.spec(assets = names(stock_returns[[s]]))
  
  # Sum of weights constrained to 1, can also specify as type = "full investment"
  portspecs[[s]] <- PortfolioAnalytics::add.constraint(portspecs[[s]], 
                                                       type = "weight_sum",
                                                       min_sum= 1, max_sum = 1)

  # Weight constraint on each stock, max is 15% of portfolio
  portspecs[[s]] <- PortfolioAnalytics::add.constraint(portspecs[[s]], 
                                                       type="box", 
                                                       min=0, max=0.15)
  
  # Objective to minimize risk based on variance (function will default to standard deviation as measure of risk)
  portspecs[[s]] <- PortfolioAnalytics::add.objective(portspecs[[s]],
                                                      type = "risk",
                                                      name = "var")
  
  # Adding a return objective, thus we maximize mean return per unit of risk
  portspecs[[s]] <- PortfolioAnalytics::add.objective(portspecs[[s]],
                                                      type = "return",
                                                      name = "mean")
  
  # Add name to specification to distinguish them
  names(portspecs)[s] <- paste("P", s, "_spec", sep = "")
}

# Example of what portspecs contains
portspecs$P1_spec
## **************************************************
## PortfolioAnalytics Portfolio Specification 
## **************************************************
## 
## Call:
## PortfolioAnalytics::portfolio.spec(assets = names(stock_returns[[s]]))
## 
## Number of assets: 15 
## Asset Names
##  [1] "WY"   "V"    "AMGN" "NOC"  "SCHW" "CPRT" "DFS"  "MCHP" "BF-B" "CINF"
## More than 10 assets, only printing the first 10
## 
## Constraints
## Enabled constraint types
##      - weight_sum 
##      - box 
## 
## Objectives:
## Enabled objective names
##      - var 
##      - mean

4.3 Optimize Portfolios

The portfolios are optimized with PortfolioAnalytics::optimize.portfolio.rebalancing(), which re-optimizes the portfolios every specified period. The main arguments are rebalance_on, training_period and rolling_window. We can decide the frequency of rebalancing (“months”, “quarters”, and “years”), the number of periods used for optimization (specify an integer matching frequency of data, in this case daily) and the number of periods to roll the window used for optimization. I also added the argument maxSR = TRUE to maximize the Sharpe Ratio of the portfolios (the demo can be found here). The function PortfolioAnalytics::optimize.portfolio() only allows for single-period optimization.

optports <- list()

for (p in 1:10) {
  optports[[p]] <- PortfolioAnalytics::optimize.portfolio.rebalancing(R = stock_returns[[p]], 
                                                                      portfolio = portspecs[[p]], 
                                                                      optimize_method = "ROI", 
                                                                      rebalance_on = "quarters", 
                                                                      training_period = 252, 
                                                                      rolling_window = 125,
                                                                      maxSR = TRUE)
  
  names(optports)[p] <- paste("P", p, "_optimal", sep = "")
}

# Example of what optports contains
optports$P1_optimal
## **************************************************
## PortfolioAnalytics Optimization with Rebalancing
## **************************************************
## 
## Call:
## PortfolioAnalytics::optimize.portfolio.rebalancing(R = stock_returns[[p]], 
##     portfolio = portspecs[[p]], optimize_method = "ROI", maxSR = TRUE, 
##     rebalance_on = "quarters", training_period = 252, rolling_window = 125)
## 
## Number of rebalancing dates:  26 
## First rebalance date:
## [1] "2016-03-31"
## Last rebalance date:
## [1] "2022-06-30"
## 
## Annualized Portfolio Rebalancing Return:
## [1] 0.164621
## 
## Annualized Portfolio Standard Deviation:
## [1] 0.206573

I used 252 periods for training/optimization and rolled the window forward every 125 days, so the portfolios are re-optimized based on previous 1 year (252 trading days) data every half-year (125 trading days) and re-balanced every quarter.

4.4 Plot Weights Over Time

Since the portfolio was re-optimized every half-year, the optimal weights selected using the mean-variance framework with a target of maximizing ex-post Sharpe Ratio should change over time. The weights of the Portfolio 1 are plotted below.

chart.Weights(object = optports[[1]], 
              colorset = c(RColorBrewer::brewer.pal(n = 7, name = "Dark2"), RColorBrewer::brewer.pal(n = 8, name = "Accent")), 
              main = "Portfolio 1 Component Weights")

5 Calculate Daily Portfolio Returns

5.1 Extract Weights

To calculate daily portfolio returns, the daily weights of the portfolio are required and can be extracted from the optimal portfolios using PortfolioAnalytics::extractWeights().

port_weights <- lapply(optports, FUN = PortfolioAnalytics::extractWeights)

names(port_weights) <- paste("P", c(seq(1:10)), "_weights", sep = "")

# Show first 3 elements in stock_returns, first 5 columns and first 4 rows of data each
lapply(port_weights[1:3], function(x) {
  head(x[,1:5], n = 4)
})
## $P1_weights
##                       WY             V          AMGN  NOC          SCHW
## 2016-03-31  2.498051e-02  1.356383e-16  3.739019e-17 0.15 -7.108513e-17
## 2016-06-30 -1.280954e-17  5.705632e-17 -2.210002e-17 0.15  1.664727e-16
## 2016-09-30  1.288686e-16 -2.789032e-17  7.170911e-04 0.15  5.654058e-17
## 2016-12-30 -1.669187e-17  1.544942e-17 -1.372749e-17 0.00  1.500000e-01
## 
## $P2_weights
##                       DG           MCD           MOS          SBAC         CINF
## 2016-03-31  1.500000e-01  1.500000e-01 -9.226609e-18 -1.203994e-18 1.500000e-01
## 2016-06-30  1.500000e-01  2.711126e-16 -4.044221e-18 -1.985114e-18 1.500000e-01
## 2016-09-30 -2.852867e-18 -5.927237e-18 -2.150322e-17  9.359765e-17 1.500000e-01
## 2016-12-30  1.678395e-17  3.777998e-02 -2.580449e-18 -4.229156e-17 2.581569e-18
## 
## $P3_weights
##                       GS          PAYX          REGN      LRCX          CHRW
## 2016-03-31 -4.559622e-17  1.500000e-01 -6.344878e-18 0.1500000  1.500000e-01
## 2016-06-30  9.257308e-17  1.500000e-01 -2.345520e-19 0.0999999  1.500000e-01
## 2016-09-30  1.573552e-16  1.500000e-01 -6.485471e-17 0.1096971 -1.372825e-16
## 2016-12-30  1.500000e-01 -1.080069e-16  8.916761e-18 0.1500000 -9.512732e-17

5.2 Calculate Returns

PerformanceAnalytics::Return.portfolio() can be used to calculate the daily portfolio returns.

port_returns <- NULL

for (r in 1:10) {
  port_returns <- cbind(port_returns,
                        PerformanceAnalytics::Return.portfolio(R = stock_returns[[r]], 
                                                               weights = port_weights[[r]], 
                                                               geometric = TRUE))
  
  names(port_returns)[r] <- paste("P", r, "_returns", sep = "")
}

data.frame(head(port_returns))

6 Evaluate Performance Against Benchmarks

6.1 Benchmarks

Since the aim of this project was to determine if portfolios of random stocks could outperform stock-picking experts and index investing, relevant proxies for benchmarking are required.

The proxies are returns calculated from the Daily Adjusted Closing Prices of popular index or actively-managed Exchange-Traded Funds that have sizeable Total Assets Under Management screened from etfdb.com. The actively-managed ETFs were filtered to have an inception date before 01 January 2015.

The index ETFs chosen are:

  1. SPDR S&P 500 ETF Trust (SPY)
  2. Vanguard Total Stock Market ETF (VTI)
  3. Invesco QQQ Trust (QQQ)
  4. iShares Russell 2000 ETF (IWM)
  5. iShares Core S&P Small-Cap ETF (IJR)
  6. iShares Core S&P Mid-Cap ETF (IJH)

The actively-managed ETFs chosen are:

  1. ARK Innovation ETF (ARKK)
  2. ARK Genomic Revolution ETF (ARKG)
  3. First Trust North American Energy Infrastructure Fund (EMLP)
  4. ARK Next Generation Internet ETF (ARKW)
  5. ARK Autonomous Technology & Robotics ETF (ARKQ)
index_tickers <- c("SPY", "VTI", "QQQ", "IWM", "IJR", "IJH")

index_prices <- NULL

for (i in index_tickers) {
  index_prices <- cbind(index_prices,
                        quantmod::getSymbols(Symbols = i, src = "yahoo", auto.assign = FALSE, 
                                             from = startdate, to = enddate, periodicity = "daily")[,6])
}

colnames(index_prices) <- index_tickers

active_tickers <- c("ARKK", "ARKG", "EMLP", "ARKW", "ARKQ")

active_prices <- NULL

for (i in active_tickers) {
  active_prices <- cbind(active_prices,
                         quantmod::getSymbols(Symbols = i, src = "yahoo", auto.assign = FALSE, 
                                             from = startdate, to = enddate, periodicity = "daily")[,6])
}

colnames(active_prices) <- active_tickers

6.2 Calculate Benchmark Daily Returns

The discrete method was also applied to the calculation of benchmark returns for comparison of the cumulative return performance.

index_returns <- na.omit(PerformanceAnalytics::Return.calculate(prices = index_prices, method = "discrete"))

data.frame(head(index_returns, 4))
active_returns <- na.omit(PerformanceAnalytics::Return.calculate(prices = active_prices, method = "discrete"))

data.frame(head(active_returns, 4))

6.3 Comparison Against Index Investing

6.3.1 Cumulative Returns

PerformanceAnalytics::chart.CumReturns(R = cbind(port_returns[,1:5], index_returns), 
                                       geometric = TRUE,
                                       main = "Cumulative Returns of Portfolios and Index ETFs", 
                                       colorset = RColorBrewer::brewer.pal(n = 11, name = "Spectral"), 
                                       plot.engine = "plotly")
PerformanceAnalytics::chart.CumReturns(R = cbind(port_returns[,6:10], index_returns),
                                       geometric = TRUE,
                                       main = "Cumulative Returns of Portfolios and Index ETFs",
                                       colorset = RColorBrewer::brewer.pal(n = 11, name = "Spectral"),
                                       plot.engine = "plotly")
From the two charts, the QQQ ETF that tracks the NASDAQ-100 Index, which has a heavy focus on the technology sector, had higher cumulative returns than the other Index ETFs. The only portfolio that managed to outperform the QQQ ETF is Portfolio 2, which is made up of

6.3.2 Drawdowns

PerformanceAnalytics::chart.Drawdown(R = cbind(port_returns[,1:5], index_returns), 
                                     geometric = TRUE,
                                     main = "Drawdowns of Portfolios and Index ETFs", 
                                     colorset = RColorBrewer::brewer.pal(n = 11, name = "Spectral"), 
                                     plot.engine = "plotly")
PerformanceAnalytics::chart.Drawdown(R = cbind(port_returns[,6:10], index_returns),
                                     geometric = TRUE,
                                     main = "Drawdowns of Portfolios and Index ETFs",
                                     colorset = RColorBrewer::brewer.pal(n = 11, name = "Spectral"),
                                     plot.engine = "plotly")

In general, the IWM, IJH and IJR ETFs have larger drawdowns, since they track the small-cap and mid-cap companies in the U.S., which tends to be more volatile. The portfolios created from 10 randomly selected stocks had similar drawdowns to the ETFs.

6.3.3 Performance Tables

table.AnnualizedReturns(R = cbind(port_returns, index_returns), scale = 252, Rf = 0.025/252, geometric = TRUE, digits = 4)
table.CAPM(Ra = port_returns, Rb = index_returns$SPY, scale = 252, Rf = 0.025/252, digits = 4)
table.DownsideRisk(R = cbind(port_returns, index_returns), scale = 252, Rf = 0.025/252, MAR = 0.07/252, digits = 4)

For the metrics that require risk-free rate (Rf), I chose an annual rate of 2.5%. For the Minimum Acceptable Return (MAR) in calculating Downside Deviation, I used an annual rate of 7%, which is approximately the long-run inflation-adjusted return of the S&P500. Portfolio 2 had the highest annualized return and Sharpe Ratio and the lowest maximum drawdown.

6.4 Comparison Against Actively-Managed ETFs

6.4.1 Cumulative Returns

PerformanceAnalytics::chart.CumReturns(R = cbind(port_returns[,1:5], active_returns), 
                                       geometric = TRUE,
                                       main = "Cumulative Returns of Portfolios and Active ETFs", 
                                       colorset = RColorBrewer::brewer.pal(n = 10, name = "Spectral"), 
                                       plot.engine = "plotly")
PerformanceAnalytics::chart.CumReturns(R = cbind(port_returns[,6:10], active_returns),
                                       geometric = TRUE,
                                       main = "Cumulative Returns of Portfolios and Active ETFs",
                                       colorset = RColorBrewer::brewer.pal(n = 10, name = "Spectral"),
                                       plot.engine = "plotly")

The ARK ETFs (ARKK, ARKG, ARKW, ARKQ) had high cumulative returns during the 2020 to 2021 period, but were the most volatile as well. Portfolio 2, which did the best against Index ETFs, had better cumulative returns in 2022 than the ARK ETFs.

6.4.2 Drawdowns

PerformanceAnalytics::chart.Drawdown(R = cbind(port_returns[,1:5], active_returns), 
                                     geometric = TRUE,
                                     main = "Drawdowns of Portfolios and Active ETFs", 
                                     colorset = RColorBrewer::brewer.pal(n = 11, name = "Spectral"), 
                                     plot.engine = "plotly")
PerformanceAnalytics::chart.Drawdown(R = cbind(port_returns[,6:10], active_returns),
                                     geometric = TRUE,
                                     main = "Drawdowns of Portfolios and Active ETFs",
                                     colorset = RColorBrewer::brewer.pal(n = 11, name = "Spectral"),
                                     plot.engine = "plotly")

The ARK ETFs had the largest drawdowns since 2021. In comparison, the EMLP ETF and portfolios made of random stocks had lower drawdowns.

6.3.3 Performance Tables

table.AnnualizedReturns(R = cbind(port_returns, active_returns), scale = 252, Rf = 0.025/252, geometric = TRUE, digits = 4)
table.CAPM(Ra = cbind(port_returns, active_returns), Rb = index_returns$SPY, scale = 252, Rf = 0.025/252, digits = 4)
table.DownsideRisk(R = cbind(port_returns, active_returns), scale = 252, Rf = 0.025/252, MAR = 0.07/252, digits = 4)

The benchmark returns (Rb) in table.CAPM() used SPY returns to give a comparison of the alpha and beta of the portfolios and actively-managed ETFs. Again, Portfolio 2 had the highest annualized return and Sharpe Ratio and the lowest maximum drawdown. The ARK ETFs would have outperformed Portfolio 2 if an investor bought at the start of 2015 and sold in early 2021 at its peak.

7 Final Remarks

Some caveats about this experiment:

  1. I had assumed that the stocks in the S&P500 Index at the time of writing were also in the S&P500 Index in 2015.
  2. I had assumed that the 15 stocks sampled for each portfolio would not be swapped for other stocks within the experiment period.
  3. I had assumed zero transaction costs.

I would like to stress that this project was mainly carried out as a fun/thought experiment. It was not meant to disprove the expertise of active managers, the strategies employed by these funds for stock selection or the power of investing into an index ETF. From the results of this simple experiment, only 1 portfolio out of 10 (Portfolio 2) managed to achieve greater cumulative returns over the years than the rest of the pack. There would be an incredible amount of luck involved to select 15 stocks from the S&P500 universe of stocks that can provide such returns, if they were to be chosen at random.